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ABSTRACT 


Access to information has never been easier and people’s eagerness and 
ability to publish information on social media platforms has never been higher. 
The growing mountain of information has presented an opportunity and a 
significant challenge for data scientists. The military in particular can benefit from 
the ability to use public information to gain an awareness of its current 
vulnerabilities as well as learning about its adversaries. 

This thesis explores methods for collecting public information from social 
media that may be revealing operational military movements. This research 
demonstrates that it is possible to train a machine to search for and find military 
members in social media by using publicly available information distributed by 
the military. The postings of military members, once identified, can then be 
ingested and processed in real time, allowing the timely detection of possible 
military information that had been posted in social media. 
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I. INTRODUCTION 


Technology has allowed the ways in which people communicate with one 
another to evolve rapidly. Social media platforms present a tempting and readily 
accessible way for people to share—instantly, without pause for reflection—their 
experiences, happiness, frustration, and general opinions on their lives and 
interactions with the world. One such platform, Twitter, allows users to create a 
profile and share 140-character messages called tweets. These tweets are then 
viewable by anyone in the world instantly [1]. The motivation for this research 
was to see if military users could be found in social media, and whether their 
postings could serve to tip off followers that a military movement will happen in 
the near future. 

During World War II, the idiom “loose lips sink ships” was popularized as a 
way to remind people that they should not discuss their relatives’ military 
movements with others for fear that this information could slip into the 
hands of opposing military planners. In reality, it would be difficult to find a case 
where loose lips actually did sink a ship during that war. Today, however, 
communication moves at the speed of light, allowing someone on the other side 
of the planet to react immediately to information on social media, potentially 
compromising sensitive military operations. 

The military is aware of the potential leak of information through social 
media, but little has been done to stop information from leaking. The biggest 
hurdle is figuring out how to stop information from leaking and no one seems to 
have a good answer. Most leaders point towards training and awareness, as 
shown in Figure 1 as the key to keeping the information off social media. 
Unfortunately, these approaches have so far been unsuccessful in stopping the 
information from leaking. 
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Figure 1. DOD Information Assurance Training 
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The DoD annual information assurance training gives guidelines to members on 
what information should and should not be posted on social networks. 

A. OBJECTIVES 

The challenges faced in securing information on social media platforms 
also presents an opportunity to the military. The military can benefit from tools 
designed to exploit the information posted on social media in the planning 
process and to make better-informed decisions based on the information. 
Exploiting social media gives operatives a new intelligence resource for which 
the cost of entry is very low. The military currently lacks a capability to exploit this 
information in real time and integrate it into the current operations and planning 
process. 
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This research explored techniques for finding military members on Twitter 
based on their profile descriptions and tweets, and to follow these users—alerting 
the author via a text message when it appeared operational military information 
had been posted. 

B. EXAMPLES 

Before embarking on this study, the author manually searched social 
media platforms to see how prevalent the leak of operational information was in 
social media. It was found that members could be found with keyword searches 
and some of the information that was shared undoubtedly represented 
operationally relevant and actionable information. 

Figures 2, 3, and 4 are examples of an enlisted sailor stationed on a 
submarine. The chronology goes from newest first to oldest last—exactly as it 
would appear on Twitter. The gray boxes attempt to hide the identity of the user. 

Figure 2. Enlisted Sailor Twitter Profile 



I am a 22 year old submariner(a- 
gang) stationed in Washington 
just tweeting what is going on in 
my life 

V bangor Washington 

The sailor self identifies as a 22-year-old enlisted member of the auxiliaries 
division aboard a submarine stationed in Bangor, Washington. 
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Figure 3. Sailor Uniform Tweets 




The sailor brags about his recent promotion to Petty Officer Second Class. He 
shows his uniform with his name, rank, and enlisted submarine warfare insignia 
reaffirming his profile information. 
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Figure 4. Sailor Pre-Deployment Tweets 



Jun S 


Last night of freedom for a couple of months I'll see you at 
the end of summer 



Last couple of beers before I 
disappear under the water for a 
couple of months 




Getting plastered before disappearing for about 4 months 


The sailor tweets that he is “disappearing under the water” insinuating that the 
submarine he is stationed aboard is getting underway. He also uses time specific 
words enabling one to estimate how long the submarine will be underway: “end 
of summer,” “couple of months,” and “about 4 months.” 
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Figure 5. Sailor Deployment Bookend Tweets 


Aug 29 


Just got back from a three month underway man is it nice to 
see life outside the sub 



23 May 2013 

Time to disappear in the morning for a couple months. 
Good bye sun I will see you towards the end of the year. 


These tweets show the start and conclusion of a deployment. 23 May: 

“disappear for a couple months” 29 Aug—3 months 6 days later—“just got back” 

He also reiterates that he was on a submarine for this deployment 

Figures 6, 7, and 8 are not case studies of a particular user, but rather 
single examples that demonstrate the kind of exploitable information that can be 
found on social media platforms. The examples show that officers and enlisted 
both leak information along with their spouses. 
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Figure 6. Facebook Submarine Sighting 
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This example is from Facebook and is a post from a U.S. Navy officer who is also 
a private pilot. While flying around Hawaii, he took these pictures of a submarine 
underway. He identifies it as a Los Angeles Class attack submarine (688) and 
gives its approximate location and heading. In the comments section, it is also 
revealed that this hull is the USS Houston (SSN-713). 
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Figure 7. Sailor Girlfriend Tweets 



Babe!!! 


OMQ 


You're lucky I'm up!!!! I just laid f 
down!! j 


Spn>l ^ t:06 PM 

6 facebooK.com 


We are close to lar>d right now 


Why are y’all close to land?? T 


On the USS Denver's Facebook.. IM SO PROUOl 
V I^^^^^^Hpic.twitter.conVI NcjpGCuuS 
Reply Retweei Favorite 


‘ mecf :i 


I am a happy girlfriend V Deployments suckP 
pic.tWftter.com/Y)dt6PlzR2 
Reply Retweet Favorite 


So I have signal 


The two posts shown are tweets by a girlfriend of an enlisted sailor. She took 
screenshots from other applications and posted them on Twitter with her own 
comments. In February, she posted the screenshot on the left of her boyfriend 
receiving a sailor of the day award aboard his ship. This post gives his name, 
rank, and ship. To the right is a screenshot of a private conversation they were 
having on an instant messaging application. Here, he discloses “we are close to 
land right now.” While this seems innocent enough, this ship is forward deployed 
to Asia, which leaves a small footprint in which it can be located. 


Figure 8. U.S. Navy Officer Twitter Profile 



I am a Lieutenant in the USN aboard 
USS NEBRASKA (SSBN 739). Aerospace 
Engineer. UltraMarathoner in training. 
Ultimate Frisbee fanatic. 


This profile is from a Twitter user. This user self identifies as a U.S. Navy 
Lieutenant—an officer stationed aboard the USS Nebraska (SSBN - 739). The 
Nebraska is a ballistic missile submarine that conducts the Nation’s most 
secretive deployments. This example shows that officers are also candidates for 
posting operationally sensitive information in social media. 
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These previous examples show that relevant military operations 
information is available on social media. The research presents a technique to 
detect operational military information in social media. It presents methods and 
software for finding military users on social media and detecting social media 
postings from these users that may possibly include information on imminent 
military movements. We define “imminent” to be a 96-hour period from receiving 
the information and “military movements” as unit operations wherein the unit 
departs, enters, or returns to a home or foreign base or port. A detection alert is 
generated when information indicating a future movement is ingested and 
processed. The detection is considered successful if the posted information of a 
user actually contains a general timeframe of the movement, and we can obtain 
the ship or unit information from this user’s profile, other social media postings, or 
social media relationships. The research will focus looking at the U.S. Navy to 
determine the best methods and approaches for finding relevant user accounts 
and ingesting their tweets in real time to alert that operational military information 
has been leaked in social media. It is expected that a successful algorithm 
developed here will be applicable towards other domains. 

C. THESIS ORGANIZATION 

The research is organized into five chapters. Chapter II presents the 
background of the research, introduces related work, and explains the 
technology used in implementing the system. Chapter III describes the specific 
approach chosen to implement the prototype. Chapter IV discusses the findings 
of the research. Lastly, Chapter V concludes the thesis and suggests a number 
of possible extensions of the thesis work. 
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II. BACKGROUND 


A. RELATED WORK 

There have been several projects internally and externally to The Naval 
Postgraduate School (NPS) with respect to social media data and specifically 
Twitter data. Internally, LT Jeremy Nauta’s thesis, Utilizing Twitter to Locate or 
Track an Object of Interest, focused on finding effective methods to utilize the 
unstructured textual information found in tweets and using that information to find 
or track a contact of interest [2]. Nauta collected and analyzed about 300,000 
tweets [2]. 

Another NPS thesis written by Kok Wah Ng, titled The Use of Twitter to 
Predict the Level of Influenza Activity in the United States collected several 
million tweets and attempted to give a better understanding of the influenza 
patterns in the United States [3]. Ng also attempted to predict when and where 
influenza outbreaks were likely to occur [3]. 

Work has also been done in the area of prediction based on Twitter 
sentiment analysis. [4] focused on ingesting tweets and attempting to predict 
stock market movement based on the inferred sentiment of the textual 
information. Research into predicting stock market movement based on Twitter 
sentiment analysis has thus far been inconclusive, however the lessons learned 
should prove valuable in this thesis. 

The University of Arizona has also done work in the area of tweet tracking 
and analysis. Their project called TweetTracker [5] has the ability to filter tweets 
based on various user-defined terms. It can then analyze the data using trend 
analysis and produce a multitude of different visualizations to help the user 
understand the data [6]. 


11 



B. SOCIAL MEDIA 


Social media refers to technology services that allow people to post 
information about a topic for other people see and interact with. There are many 
social media platforms—each with their own niche that distinguishes them from 
the others. Some are focused on videos and pictures while other are geared 
more toward giving users the ability to create a full online representation of 
themselves to include likes, dislikes, relationships, and images. 

People use social media platforms for a plethora of reasons. Some people 
use them to stay in touch with friends, others to promote themselves or a 
product. Many people use social media as their news source for current events. 
Social media platforms work by allowing users to create the content that others 
see. 

Social media platforms share a common trait that the information posted is 
generally available to other users around the world instantly. However, various 
platforms allow users to restrict who can see their information but platforms such 
as Twitter and Instagram make user’s information public by default. Facebook 
has stricter controls on who can see user data based on user associations 
between each other. Facebook also requires anyone accessing its platform to 
have an account. 

C. TWITTER ECOSYSTEM 

Twitter is a social media company based in California, USA. Their platform 
allows users to make a profile and post information for others to see and interact 
with. They charge nothing for the service rather making their income from 
targeted advertisements that appear on a user’s timeline feed along with a 
product line available to companies and advertisers. The user’s timeline feed is 
populated with posts from accounts that the user follows as shown in Figure 9. 
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Figure 9. Twitter Timeline Feed 
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An example of a user’s timeline feed. The feed is populated with content from the 
accounts the user is following. At the bottom is a promoted post. Twitter makes 
its income by showing users promoted posts from accounts they do not follow. 


1. Why Twitter 

Twitter was chosen as the social media platform to collect data for this 
thesis because it has a very user-friendly application programming interface 
(API) and because there is no special privilege required to gain access to the 
information in their databases or their streaming information. The third line of the 
Twitter privacy policy states, “What you say on the Twitter Services may be 

viewed all around the world instantly” [7]. Other platforms such as Facebook and 
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Linkedin would have been great platforms to access and analyze data for this 
research but their user agreements and API functionality precluded the scale of 
data mining that would be necessary to meet the goals stated in Chapter I. 
Facebook and Linkedin also require significant financing to access mass 
amounts of user-generated data. Along with not having the financing to buy the 
data, doing so would have violated the research operating governance set forth 
by the NPS Institutional Review Board. Without these barriers to entry, the 
technology approach explored in this thesis is applicable to the data contained in 
other social media platforms. 

2. Twitter Terminology 

The Twitter ecosystem has specific terms as detailed in [8] for various 
types of actions in the system. The research references the following selected 
terms. 


a. User ID 

A user identification (ID) is assigned to a user when they sign up for the 
service. The user ID is unique to the user and cannot be changed. The user ID is 
a numeric value. 

b. Username 

The user chooses a username when they sign up for the service. The 
username is also referred to as the user handle. The handle has to be unique but 
can be changed an unlimited number of times while the account is active. The 
handle is comprised of alphanumeric characters and underscores. 

c. Tweets 

A post on Twitter is called a tweet. A tweet is limited to 140 characters and 
can contain entities besides text. In addition to text, a tweet can contain links to 
other webpages and it can have a picture attached to it. 
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d. Hashtag 

Users also use various textual conventions that are specific to Twitter. A 
“#” attached to the front of a word is called a hashtag and allows the user to 
designate their tweet as part of a trending topic or simply to give a one word 
summary of their point. 

e. @handle 

Another convention is placing the symbol at the front of a user’s 
Twitter name to tag them in the post. 

f. ReTweets 

The last textual convention is using “RT” plus a username to signify what 
is being posted is actually a repost (retweet) of another user. 

g. Profile 

Users can create profiles for their accounts. The profile can consist of up 
to 160 characters of text, two pictures and various optional information such as 
webpage, location, and language. Twitter also attaches the account creation date 
to the profile. An example of the U.S. Navy’s account profile is shown in Figure 
10 . 


Figure 10. U.S. Navy Twitter Profile 

U.S. Navy 

@USNaw 

Official Twitter account of the #USNavy. 
(Following, RTs and links * endorsement) 

g The 7 seas! 

^ navy.mil 
O Joined July 2009 
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h. 


Friend 


A friend is an account that the user follows on Twitter. The action of 
friending an account means that the friend account’s tweets will populate the 
user’s timeline. 

i. Follower 

A follower is an account that follows the user on Twitter. To the following 
account, this action is called friending an account. The follower’s timeline will 
populate with the tweets of the user being followed but nothing will happen to the 
timeline of the account being followed. 

An account has no requirement to friend or follow any accounts. An 
account can friend or follow another account regardless of whether or not the 
other account takes any action towards them. This follows with the public nature 
of Twitter in that anything said on Twitter is publicly available. 

j. Verified User 

A verified Twitter user is an account that Twitter has confirmed belongs to 
the user the account claims to be. The anonymity of the Internet allows people to 
easily pretend to be highly recognizable brands, people, or things. Twitter 
combats fraudulent accounts by placing a blue checkmark as shown in Figure 10 
on the account page so that other users know they are interacting with the 
authentic account and not a fake account. A verified user typically has many 
followers, is famous and will not be in the military. 

D. TWITTER API 

The scripts used to extract the data from the Twitter database were all 
built for this research project using Python version 2.7 and the open source 
Python library called Tweepy. 

There are two Twitter APIs available to anyone with a Twitter account. The 

first API is the streaming API. This API allows the user to access the streaming 
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data being written by Twitter users in real time. There are several restrictions on 
the amount of data that a user can obtain for free through the streaming API. 
First, the user shall apply a filter to the data they are looking for. This filter can be 
a keyword, username, geolocation, or a combination of these among others. The 
second restriction is that a nonpaying user can only access up to one percent of 
the Twitter stream at any given time. Despite these restrictions, a massive 
amount of data can be had in a short amount of time. 

In tests of the streaming API, it was found that with a moderate number of 
filters in place, one hundred percent of the tweets that met the filter criteria would 
be displayed to the user regardless of the one percent cap. 

Several filters can be placed on the streaming API. First, a keyword filter 
can be placed on the streaming data. This filter can consist of up to 400 
keywords in an “and” or “or” relationship [5]. Next, a geolocation filter can be put 
on the API. Not all tweets have geolocation but this filter will ensure all tweets 
returned do have geolocation and plot within a bounding box. Another filter is the 
filter by user. The streaming API allows a filter of up to 5000 users [5]. This is 
essentially the same as following these users and allows a program to 
specifically access the streaming tweets of up to 5000 users without the users 
knowing their tweets are being collected by the program. 

Streaming tweets are received as a JavaScript Object Notation (JSON) 
object. The JSON object contains a wealth of information not readily visible from 
the Twitter web interface, including information about the user’s profile and the 
embedded objects in the tweet. An example of a tweet as seen on the web 
interface is show in Figure 11 and the same tweet viewed through the streaming 
API is shown in Figure 12. The only filter used to capture the tweet was the word 
“navy.” 
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Figure 11. Tweet from Twitter.com Web Interface 



The U.S. Navy relies on this 14-year-old, 
defunct technology. And it's paying millions 
for it. cnnmon.ie/ISQRRbD 


Noble Microsystems NonieMfcroTech • 2h 


View summary 


Figure 12. Raw Tweet from Twitter API 


• • • 


TwitterStuff — bash — 120x23 


{"credte(J_at"s"Fri Jun 26 23:44:25 +0080 2015*',**i(J'‘:614580003381071872,''id_str”;''61458000338l07l872",'*texf':'‘The U.S. Na 
vy relies on this 14-year-old, defunct technology. And it's paying nillions for it. http:\/\/t.co\/cnFjnzsV5c","source”: 
”\u003ca href»\”http;\/\/twitter.co«\" rel»\"nofollow\"\u0e3eTwitter Web Client\u003c\/a\u003e","truncated":false,"in_re 
at us.id": null, "in_reply.to.5t at us^id^str": null, *'in^reply^to.user_id": null, "in^reply^to^user^id_str": null," in^re 



icroTech","location":"Indlandpolis, Indiana, USA”,"url";null,"description":"Noble Microsystems is a full service Infosec 
research consultancy specializing in protecting your digital assets.","protected”:false,"verified":false,"followers_cou 
nt‘*:S2,"friends_count":220,"listed_count";2,”favourites_count":16,”statuses_coont":1812,"created_at":"Wed Oct 30 11:13:5 
9 ♦0000 2013","utc_offset":-14400,"time_zone":"Ea5tern Time (US & Canada)","geo_enabled":true,"lang":"en","contributors, 
enabled":false,"is_translator":false,"profile_bdckground_color":"C0DEEO","profile_background.image.urI":"http:\/\/abs.tw 
ing. com\/iinages\/the(iies\/thentel\/bg. png", "prof ile.backg round_iiiiage_url_ht tps": "https ;\/\/abs.twi«g.con\/images\/themes\/ 
thefliel\/bg. png", "profile.background.tile":false,"profile.link.color":"008484","profile.sidebar.border_color":"C0DEEO","p 
rofile.sidebar.fill.color":"0DEEF6","profile.text.color":"333333","profile.use.background.image":t rue,“profile.image.url 
":"http:\/\/pbs.twifflg.co«\/protile_i«ages\/606589111428747264\/Sit956n9.norfflal.png","profile.iniage.url.https":"https:\/\ 
/pbs. twimg.coni\/profile.iiiages\/606589111428747264\/Sit956n9_normal.png","profile.banner.orl":"https:\/\/pbs.twi(ig.coiB\/ 
profile.banners\/2164585274\/1433469537","defaolt_profile":true,"default_profile_image":false,"following":null,"follow.r 
equest.sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet.count":0 
,"favorite.count":0,"entities":<"hashtags":11."trends":I1,"urls":({"url":"http:\/\/t.co\/cnFjnzsV5c","expanded.url":"htt 
p:\/\/cnnmon.ie\/lSQRRbO","display.url":"cnn*on.ie\/lSORRbD","indices":[95,1171)1."user.mentions":[]."symbols":(l},"favo 
rited":false,"retweeted":false,"possibly_sensitive":false,"filter level":"low","lang":"en","timestaBip_ms":"1435362265807 
") 


■r 


The same tweet from Figure 11 as seen from the Twitter streaming API. 

Data through the streaming API can be displayed in various ways. Figure 
13 is a display of the streaming API using the same “navy” keyword filter with the 
time-date stamp and the text of the tweets as they are received. 
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Figure 13. Twitter Streaming API Displayed in Terminal 


# # # TwittsrStuff — Python — 181x23 

Sat Jun 27 »2:eS:M •*»»• 2»tS aSa^sDenia aHatatca mutktack aNavy Order SMS or M •et2139t9*ll . Line: adekunyit http://t.c»/]lT7Sfaaar 

Sat Jun 27 92:9S:20 *09M 29tS la toooooo joining tt>a navyV 

Sat Jun 27 »2:9S:2l **IM 201S Dro«en-Aeha*Zentruo der Navy entltsit SS Therapio-Tellnehaer Mtp://t.co/C3B5ntYNeF 

Sat Jun 27 P2:IS:22 •MM 301S New Hens Clasiu Harrington Boaber Jacket Retro Nod Skin Scooter Navy Wine Stack - Full re- http://t.ce/n4dy20P779 http://t.ee/lZlf42av0r 

Sat Jun 27 S2:SS:27 vOSM 2tlS « WAVY izaAA«f«tftt'CS9 *N99 - «K0JI‘'S'7(:A.va«aa^VS'tt<A«%Sa3«H(*)iCltau^. 

N99i3Ma9 f http://t.co/C3ihSuU6S 

Sat Jun 27 S2:IS:3S •MM 2StS #6/26/1979: Mavy Talk Force 7B coMteted ametweeping of North Vietnaaeie port}. 

Sat Jun 27 *2:#5:32 •PIM 2#tS Neni Polo RALPH LAUA8I Navy Blue Large Shirt Pique Cotton Large Logoi Big * - Full read by. http://i.ee/rNpiqPvaaH http://t.co/SROBOb3NtL 

I Sat Jun 27 •2:BS:33 ^BSM 2#1S RT fxayeyo Marc: CLBJAlntShlt Found Kobe's Locker rooa at the Navy Base where t work at. http://t.co/kEVSUJK3db 

Sat Jun 27 •3:BS;39 vOBM 2#1S l•<^Vhl « ' a«MSarna« *33A7l7ir>Ki/-7 77-. («•«; http://t.C 0 /FVWcaSxTTv 

Sat Jun 27 *2:BS:3S ••BM 2#1S Hens Polo RALPH LAUftat Navy Blue Large Shirt Pique Cotton Large Logos Big 4 - Full read by. http://t.co/HOfbgnSigB http://t.co/iZfWL9o3ah 

. Sat Jun 2? •2:BS:37 •BBM 2#1S New Hens Classic Harrington Boaber Jacket Retro Hod Skin Scooter Navy Wine Black - Full re. http://t.co/01tM2KS17T http://t.co/KPTZl>C2fSG 

Sat Jun 27 #2:95:3# vBBM 2#tS Hens Polo RALPH LAUREN Navy Blue Large Shirt Pique Cotton Large Logos Big 4 • Full read by. http://t.co/ld]a(H31ca http://t.co/nTZtnnnlVt 

Sat Jun 27 #3:BS:43 •##M 2#1S Hens Polo RALPH LAUROI Navy Blue Large Shirt Pique Cotton Large Logos Big 4 - Full read by- http://t.co/2lOKihUHrk http://t.co/HOee7HwNF4 

Sat Jun 27 #2:#5:44 •*#M 2#15 New Hens Classic Harrington Boaber Jacket Retro Nod Skin Scooter Navy Wine Black - Full re- http://t.ce/gCJHiRCOTh http://t.eo/rHFxdluTCs 

Sat Jun 27 #2:#S:48 •*#M 2#1S Joined the ailitary Just to bank with Navy Federal... Jokes on theov 

Sat Jun 27 #2:#5:93 •BBM 2#1S Ny Dad drrickey had his retlreaent party today...and this was hia in the *Navy during boot caap in. https://t.co/pupCyLa22R 

Sat Jun 27 #2:#S:S6 •BBM 2#t5 RT tcuraudgeonl337: fCriwi of course this navy/navel theaed belly button rings exist http://t.ce/alXhXfeCIw 

Sat Jun 27 #2:#5:S9 vdSM 2#1S Boise -9.S vs Wash, Harshall -S at Ohio. Texas vlS.S at TCU, Baaa -S.S vs Ark. NO -14 vs Navy. Texas -2 vs KSU. BYU *7.$ vs Mlii... 

Sat Jun 27 #2:#6:B# •##M 2#1S Hens Polo RALPH LAURB4 Navy Blue Large Shirt Pique Cotton Large Legos Big 4 - Full read by. http://t.co/cTOIHyJZpx http://t.eo/kTJb246o6b 

Sat Jun 27 #2:#6;B8 •##M 2#1S http://t.co/46B0#iwaky Classiques Entier Open Toed Woaen's Navy with Silver Lining Heel sue 11 

4 Sat Jun 27 #2:#6:#8 •##M 2#1S Caia Blockade Breakers Likely To Face Israeli Navy htlp://t.co/WxjlVuTMKc 


The second Twitter API allows programmatic interaction with Twitter. 
Whereas the streaming API is real time receive only, the second API provides 
the capability to perform POST and GET requests with the Twitter infrastructure. 
This architecture is called a RESTful architecture or representational state 
transfer (REST) architecture. [9] The RESTful API is what allows third-party 
applications to post pictures and status updates along with request historical data 
from the Twitter databases. 

This thesis used both APIs to conduct research. Queries to the Twitter 
database for users and their historical tweets were performed using the REST 
API. With users found, the streaming API was used to alert on potential real time 
and future events. 

1. Rate Limiting 

Twitter has several policies that govern how much data can be obtained 
from their servers from an application. These restrictions can vary depending on 
the API being used to access information. The first restriction is the one percent 
cap on the streaming API. This restriction states that a program can gain access 
to at most one percent of the real time Twitter feed. This research narrows the 
stream down to 5,000 users so most tweets make it through despite the 
restriction. Exceptions happen when there are large social events (New Year’s 
Eve, Super Bowl, etc.) where the 5,000 users may be tweeting so much that 
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some tweets are not streamed by the Twitter server through the API. However, 
these instances are rare. 

The second access restriction is query rate limiting. This policy is based 
on the notion that a program has a limited number of requests it can make in a 
fixed window of time. Rate limiting is fairly common with Internet services and 
was also seen when accessing the MonkeyLearn API discussed in Chapter III. 

Twitter uses 15-minute time windows and limits the number of queries that 
can be made in this window. The window starts with the first query made and the 
query limit is based on the information being requested. Table 1 shows the 
current Twitter rate limit restrictions. 
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Table 1. Twitter Rate Limit Restrictions 


Title 

Resource family 

Requests / 15-min window 

GET application/rateJimit status 

application 

180 

GET favorites/list 

favorites 

15 

GET followers/ids 

followers 

15 

GET followers/list 

followers 

15 

GET friends/ids 

friends 

15 

GET friends/list 

friends 

15 

GET friendships/show 

friendships 

180 

GET help/configuration 

help 

15 

GET help/languages 

help 

15 

GET help/privacy 

help 

15 

GET help/tos 

help 

15 

GET lists/list 

lists 

15 

GET lists/members 

lists 

180 

GET lists/members/show 

lists 

15 

GET lists/memberships 

lists 

15 

GET lists/ownerships 

lists 

15 

GET lists/show 

lists 

15 

GET lists/statuses 

lists 

180 

GET lists/subscribers 

lists 

180 

GET lists/subscribers/show 

lists 

15 

GET lists/subscriptions 

lists 

15 

GET search/tweets 

search 

180 

GET statuses/lookup 

statuses 

180 

GET statuses/oembed 

statuses 

180 

GET statuses/retweeters/ids 

statuses 

15 

GET statuses/retweets/:id 

statuses 

15 

GET statuses/show/:id 

statuses 

180 

GET statuses/user timeline 

statuses 

180 

GET trends/available 

trends 

15 

GET trends/closest 

trends 

15 

GET trends/place 

trends 

15 

GET users/lookup 

users 

180 

GET users/show 

users 

180 

GET users/suggestions 

users 

15 

GET users/suggestions/:slug 

users 

15 

GET users/suggestions/:slug/members 

users 

15 


From Dev.twitter.com. (n.d.). “Rate Limits: Chart | Twitter Deveiopers.” (2015). 
[Online]. Available: https://dev.twitter.com/rest/public/rate-limits. 
Accessed 12 Aug 2015]. 


Rate limiting presents a hurdle when doing data mining research using 
Twitter. It is important to ensure every request gets the most copious and high 
quality data back from the Twitter servers so that no requests are wasted. 


21 










































E. 


MACHINE LEARNING 


Machine learning is the premise that a machine—a computer—can score 
data it has never seen before based on scored data used to train it. Machine 
learning was used in this research to classify the user profiles and tweets. The 
current state-of-the-art machine learning model is the Support Vector Machine 
(SVM). 

Broadly speaking, an SVM model attempts to find the best splits in the 
training data that maximizes their differences and minimizes the crossover 
between them. With the training data split, it then tries to place new data on one 
side of the split and give a score to the likelihood that the data is placed on the 
correct side of the split [10]. A classic graphical representation of this is shown in 
Figure 14. 


Figure 14. Support Vector Machine Illustration 



The solid line represents the decision plane and the dashed lines represent the 
support vectors. After K. Huang, Machine Learning. Berlin: Springer, 2008, p. 25. 
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There are two schools of machine learning—unsupervised and 
supervised. Unsupervised does not have a training set but rather ingests data 
and attempts to split the data and group the splits into clusters. Supervised 
machine learning ingests a known learning dataset and tries to fit new data to the 
learning set to make a guess as to how the new data should be classified. Figure 
15 shows the flow of data to build and use a supervised learning model such as 
an SVM. 


Figure 15. Supervised Learning Model 



Expected 

Label 


From Astroml.org. (n.d.). “2. Machine learning 101: general concepts — Machine 
learning for astronomy with scikit-learn.” (2015). [Online]. Available: 
http://www.astroml.org/sklearn_tutorial/general_concepts.html#supervised- 
learning-model-fit-x-y. Accessed 25 Aug 2015. 
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F. UNSTRUCTURED AND NOISY DATA 

Mining Twitter and social media generated data in general is very difficult 
due to what is called unstructured and noisy data. Unstructured data is data that 
lacks organization. Noisy data is that which contains noise within the data being 
investigated. User generated data can contain noise in the form of misspellings, 
extra or missing characters, and needless or excessive punctuation. An example 
of unstructured and noisy data is the payload of an e-mail. While the e-mail itself 
has a well understood structure, what the user types in the payload or body of 
the e-mail has no structure. Twitter has instances of user generated unstructured 
and noisy data. The first is the user profile description. The only structure in the 
profile is a 160-character limit. The second instance of user generated 
unstructured noisy data is the tweet field, which is limited only by 140 characters. 

Unstructured and noisy data is difficult for data mining tools. The tools are 
built using known data and they score the known the data in order to build a 
model that scores the data being mined. When data is fed into the model, it is 
processed and scored. Unstructured and noisy data has the tendency to score 
low because it is very difficult to train the model to handle unstructured and noisy 
data without the model becoming too large and diluting what is being looked for. 


24 



III. METHOD 


A. APPROACHES 

We studied several different approaches for detecting tweets that may 
contain any operational military information. 

The first approach listens to the public Twitter stream and using simple 
keyword filtering combined with text classification to signal that military 
operational information had been detected. The advantage of this approach is 
that it can begin almost immediately if there is already a corpus in place to define 
the keywords and train the text classifier. The disadvantage is that the streaming 
API is limited to one percent of the streaming tweets on Twitter. 

With the first option, relevant tweets may be captured, but there may be a 
significant number of tweets that are missed because the streaming API is limited 
to a cap of one percent of the tweets. Another issue is that the keywords may be 
too limiting or not limiting enough to catch the relevant tweets. A constant theme 
with this research was filtering just the right amount of data to be relevant but not 
overwhelming to a human reviewer. 

With the second and third approaches, we first establish a set of users 
that may be associated with the Navy. With a list of users that appear to be in the 
Navy, the streaming filter would then become their user IDs. All of their tweets 
would be ingested and tweets that are classified as operational military 
information would trigger a detection alert. An aspect that makes this approach 
attractive is that the built user list has already undergone one level of scrutiny to 
attempt to validate them as military accounts. When their tweets trigger a 
detection alert that represents a second level of scrutiny. By the time an alert 
is issued, the information has been through two levels of scrutiny that attempt to 
cut down on false alarms—1) the tweet came from an apparent military user, and 
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2) the tweet contains pieces of text that are classified as military and have a time 
element to them. 

Based on this strategy, a second approach was to look at popular U.S. 
Navy accounts and analyze their friends and followers lists for accounts of 
interest. There are a lot of accounts that are associated with the Navy; a few of 
the most popular ones are shown in Figure 16. 

Figure 16. Popular U.S. Navy Associated Twitter Accounts 


■S Jonathan W. Greenert 

OCNOGreenert 

gpjj Naval Forces Europe 

■SB OUSNavyEurope 

TWEETS 

388 

TWEETS 

5,259 

FOLLOWING 

213 

FOLLOWING 

793 

FOLLOWERS 

7,869 

FOLLOWERS 

11.3K 

FAVORITES 

11 

FAVORITES 

548 

V U.S. Navy 

fUvV OUSNavy 

TWEETS 

18.6K 

FOLLOWING 

1,163 

FOLLOWERS 

489K 

FAVORITES 

109 

gg SECNAV Ray Mabus 

OSECNAV 

TWEETS 

1,513 

FOLLOWING 

55 

FOLLOWERS 

30.1 K 

FAVORITES 

17 

US FLEET FORCES 

OUSFLEETFORCES 

TWEETS 

3,589 

FOLLOWING 

104 

FOLLOWERS 

4,475 

FAVORITES 

1 

Destroyer Squadron 7 

ODESRON. 7 

TWEETS 

115 

FOLLOWING 

184 

FOLLOWERS 

266 

FAVORITES 

33 

^ ^ PERS-41 

©PERS41 

TWEETS 

450 

FOLLOWING 

7 

FOLLOWERS 

1,811 

FAVORITES 

2 


Shown is a sampling of Twitter accounts associated with the U.S. Navy. 
Note that the first four are verified accounts annotated by Twitter 
with a blue checkmark. 


The general U.S. Navy account is the most popular at nearly 500,000 
followers. Approach two is advantageous because that among the followers of 
these accounts, there is probably a high concentration of military users. The 
disadvantage is that blindly downloading the profiles and tweets of a half million 
users is very time consuming. 

The third approach that was explored and ultimately chosen was using 
publicly available military promotion lists as the seed for finding users on Twitter. 
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Specifically, the US Navy promotion lists were chosen. This approach is similar 
to approach two but the data being downloaded is from directed queries based 
on confirmed military member names. 

The Navy Personnel Command posts the promotion lists of all enlisted 
sailors ranks E-4 through E-9 and officers ranks 0-3 through 0-10 on their public 
website. No sign in or common access card (CAC) privilege is needed to access 
these lists. The lists are also commonly published on military websites and 
newspapers upon their release. 

B. GATHERING DATA 

It is important to note that no special privilege was used to access the 
Navy personnel lists. There was also no special privilege used to access Twitter. 
All information downloaded was publicly available, free of charge, and accessed 
according to their terms of service. The NPS Office of General Counsel and the 
Institutional Review Board determined that the data being collected by this 
method was public and therefore not considered to be humans subject research. 

The Twitter API ingests a string name as the parameter to search for user 
profiles. The Twitter server receives the name and returns possible results based 
on a combination of factors, including profile activity and name match. Twitter 
also looks for common variations of the name [11]. An example of this 
methodology is how the Twitter search users algorithm will treat the name “Tom.” 
The search algorithm will also search for “Thom,” “Thomas,” “Tomas,” etc. 

Each call to the API user search will return up to twenty possible 
candidates. The algorithm used for this is shown in the finder function in Figure 
17. It would make API calls up to five times per name thus having up to one 
hundred candidates per name. 
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Figure 17. Twitter User Search and Tweet Download Python Code 

irport tii-ccpy 
fror sets irport Set 

« keys and secrets are acquired fron dev.twitter.con 
auth ^ tweepy.OAuthHandlcr(CONSUMEft KEY, CCKSUMEft SECRET) 
auth.set_access_tokcn(ACCESS_KEY, ACCESS.SECRET) 

* define the API hardier 
api > tweepy.APKauthI 

* input file of names to lookup thru Twitter API 
input_narcs - openCnamefile").readlinesl) 

* output file to disk 

outfile - open("twitterdata","a") 

« list of candidates who have already been downloaded 
blacklist = Sct(II) 

» function rakes calls to Twitter API to find users given a name 
def finder(nanc): 

# initial API call that returns up to 2t candidates - the max allo«ed per call 
candidates - api.search_users(q-nane,per_page-28,pagei:l) 

» iterate to build the candidate list up to 188 candidates 
for i in 12,3,4,SI: 

candidates.appcnd(api.scarch_users(q-rame,per_pagc^ 28 ,pa 9 e:i)I 
return candidates 

* function to download candidate's tweets 
def getTw'cetsIcandidate): 

» define list to add tweets to 
all_tweets - II 

» initial API call that returns first 288 tweets - the max allowed per call 
new_twects - api.user_tineline(scrccn_nanc ■ candidate, count ■ 288) 

» add ncw_twects to all.tweets 
all_twccts.cxtcrd(ncw_twects) 

» define variable for oldest tweet downloaded 
oldest » all_tweetsl-ll.id - 1 

» iterate to build the all_twcets list up to 1888 tweets 
while len(all_tweets) < 881: 

new_tweets - api.Lser_tineline( screen_nane « candidate,\ 

count-288,\ 
nax_id-oldcstI 

all_twects.extend(ncw_twects) 
oldest - all_tweets|-ll.id - 1 
return all_tweets 

* main execution of the program 

if _name_ «= "_rain_": 

for name in input.nanes: 

candidates > finder(nane) 

for candidate in candidates: 

if candidate not in blacklist: 

outfile.write IgetTweets(candidate)) 


Note that manipulation of the tweet object before saving is omitted. 
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With the list of candidate user names, the next step was to download the 
profile and tweet information. The tweet object returns a nested user object that 
includes profile information so it is not necessary to search for the profile 
information apart from the tweets. 

It is possible to download the most recent 3,250 tweets per user at 
intervals of 200 tweets per request through the API [12]. A limit of the most 
recent 1,000 tweets was used. The algorithm to download the user tweets is 
shown in the getTweets function in Figure 17. The self imposed 1,000 tweet 
limit was partly due to the Twitter API rate limiting and also a general assumption 
was used that if a user did not identify as being military in their most recent 1,000 
tweets, they probably were not going to be worth following for posting operational 
military information. 

Two simultaneous Python sessions were used to download the user data 
from Twitter. Each promotion list was about one terabyte of information and the 
throughput from Twitter was between one and three megabytes per second. 

We used the fiscal year (FY) 14 and FY15 E-4 through E-6 promotion lists 
as the basis for the user search. These two lists collected include nearly 45,000 
names, and when passed to Twitter, 1.2 million unique Twitter user identities 
were returned. Running queries to retrieve the tweets of these users returned 
approximately 430 million tweets. Chapter IV will discuss the results of the 
queries in more detail. Data accessed through the API contains objects in 
JavaScript Object Notation (JSON). The data fields that were saved are shown 
and described in Table 2. 
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Table 2. Twitter Saved Data Fields 


Field Name 

Description 

name_searched_for 

The name from the promotion list that returned 
the unique ID to search 

tweet_user_id_searched_for 

The user ID that was returned from the promotion 
list name and passed back to Twitter to download 
tweets 

tweet_created_at_date 

The date the tweet was created 

tweet_created_at_time 

The time (UTC) that the tweet was created 

tweetjatitude 

The geolocated latitude of the tweet if 
geolocation is enabled on the account 

tweetjongitude 

The geolocated longitude of the tweet if 
geolocation is enabled on the account 

tweet_id 

The unique ID of the tweet as assigned by Twitter 

tweet_favorite_count 

The number of times the tweet has been 
“favorited” by Twitter users 

tweet_in_reply_to_screen_name 

If the tweet is in reply to another Twitter user, this 
field populates with the user’s screenname 

tweet_in_reply_to_status_id 

If the tweet is in reply to another Twitter user, this 
field populates with the user’s unique tweet ID 

tweet_author_id 

The user ID of the author of the tweet 

tweetjanguage 

The language of the tweet - self reported by the 
author 

tweet_retweet_count 

The number of times the tweet has been 
retweeted 

tweet_source 

The source of the tweet - ex. iPhone, Android, 
Web, etc. 

tweet_text 

The 140 character text generated by the author 
and commonly known as “the tweet” 

tweet_user_id 

The unique user ID of the author - saved as a 
check to the second field saved. The searched 
and returned user ID should always be the same 

tweet_user_description 

The user profile of the author 

tweet_user_created_at_date 

The date the user account was created 

tweet_user_created_at_time 

The time (UTC) that the user account was 
created 

tweet_user_followers_count 

The number of followers the account has 

tweet_user_friends_count 

The number friends the account has - this is also 
known as the accounts this account is following 

tweet_user_language 

The language of the account as reported by the 
user. 

tweet_user_location 

The self reported location of the user 

tweet_user_screen_name 

The screen name of the user 

tweet_user_verified 

Boolean value indicating if the user is verified - 
verified accounts are usually associated with high 
profile accounts to suppress the prominence of 
fraudulent accounts 

tweet_user_statuses_count 

The number of tweets the account has authored 

tweet_user_time_zone 

The time zone of the user 


30 





















































C. CLASSIFYING USERS 

Finding the users manually by reading their profiles and tweets would take 
too much time to be relevant. In order to speed up the process it was decided to 
use machine learning to read the data and make determinations of whether or 
not the profile or tweet was a military user. 

Classifying the users was accomplished by utilizing two approaches. The 
first approach uses the MonkeyLearn online text classification service and the 
second approach was building classifiers in Python. MonkeyLearn abstracts 
nearly all of the complication of machine learning from the user. The service 
allows the user to upload the datasets to train an algorithm and adjust some 
settings such as classifier type, stop words, n-gram range, and specific options 
for social media data. 

The MonkeyLearn is designed for someone new to machine learning and 
allows one to begin classifying text almost immediately. MonkeyLearn typically 
deals with customers that make at most several thousand queries per month. 
That model did not work for this research because of the volume of data 
downloaded from Twitter. To overcome this, a dedicated server was provided for 
one month with unlimited queries. This enabled all the profiles and tweets to be 
classified in about 23 days through their API. 

The MonkeyLearn API can ingest up to 500 texts per request for 
classification. There is also a rate limit of 30 requests per minute. The script 
written to interface with the API is shown in Figure 18. The script encoded the 
texts in JSON format and sent the JSON object as the payload of the API 
request. The API responds with a payload of the classified texts in JSON format. 
The response from the API is ingested, decoded from JSON, matched with the 
user ID of the text and saved to disk. 
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Figure 18. MonkeyLearn API Python Interface Code 

ijnport requests 
import json 

from time import sleep 
from sets import Set 
import CSV 

# output file 

writer = csv.writerlopenC'classif ied .profiles.csv","a")) 

# set to ensure no profiles are checked twice 
checked.ids = Set(()) 

# function to communicate with monkey learn and save data 
def classData(datalist, outputdata): 

try: 

data = {‘text.list': datalist} 
response = requests.post( 

"https://api.raonkeylearn.com/v2/classifiers/cl_idhere/classify/?", 
data=j son.dumps(data), 

headers={'Authorization': 'Token «« unique token ***', 
'Content-Type': 'application/json'>) 

results = json.loads(response.text)("result") 

i = 0 

for result in result: 

writer.writerowloutputdata[i] ,\ 

str(piece(0]("probability")),\ 
piece(0)("label")) 

i += 1 


» API is rate limted to 30 requests per minute 
except BaseException, e: 
print e 
sleep(l) 

classData(datalist, outputdata) 

# main execution of the program 
if _name_ = "_main_": 

# output_data is a list of user ids - this is used 

# when writing the output file 

# data.list is the data sent to the API to classify 

# the two lists have to be seperated while classifying then 

# joined back together when writing the file to disk 
output_data, data_list =0, 0 

for line in x: 

p = line.split)",") 

# build list of profiles to classify 

# only check ids that haven't been classified and have text 
if p(0) not in checked_ids and p(l)(:-l) != "": 

checked_ids.update)[p(0))) » add the current id to the checked_ids 

output_data.append(p[0)) # build list of current user ids 

data_list.append)p[l)) # build list of profiles 

# API limited to 500 text classifications per request 
if len)data_list) > 499: 

classData)data_list, output_data) 
data_list, output_data =0, 0 

# this statement is for the final profiles that didn't add up to 500 total 
if len)data_list) > 0: 

saveData(data_list, output_data) 

# close the file with classification information 
writer, close)) 
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The second approach was to build our own text classifiers for the user 
profiles and the user tweets. For this task, a Support Vector Machine (SVM) 
classifiers were built in Python. 

The SVM classifiers built by the author used the Scikit-learn machine 
learning libraries [13]. The code to build the classifiers is derived from [14] and is 
shown in Figure 19. The main functions used to build the classifiers include 

split_into_lemmas, the CountVectorizer, and the TfidTransformer. 
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Figure 19. Python Code to Build the Text Classifiers 

irport CSV 

fron textblob irport TcxtBlob 
inport pandas 
inport cPicklc 

fron sklcarn.fcaturc_extraction.text inport CountVectorizcr, TfidfTransfomcr 
fron sklcarn.svn inport SVC 

fron sklcarn.retries inport classification_rcport, confusion_natrix 

fron sklcarn.pipeline inport Pipeline 

fron sklcarn.grid.search inport GridScarchCV 

fron sklcarn.cross.validation inport train_tcst_split, StratificdKFold 
fron nltk.stcn.lancastcr inport LancasterStemer 

def split_into_lcnnas(ncssagc): 

r dealing with codecs 

nessage > str(ncssagc) 

nessage « unicode(nessagc, 'utf-8'l 

nessage ■ nessage.cncodc('unicodc.escape'I.lower!) 

» steming 

St - LancasterStemer!) 
nessage « message.split!" ") 
wds • (I 

for word in nessage: 

wds • wds « |st.stcn!word)l 
message « “ ".joinlwdsl 

words • Tcxt8lob!ncssagc).words 
» for each word, take its "base forn" ■ Icnna 
return Iword.lcnra for word in words) 

» main execution of the program 
if _name_ «» "_main_ 

nessages ■ pandas.read.csv!'profile.corpus.csv', sep-',', quotingwcsv.OUOTE_NON£, 
names*("nessage",“label"I) 

r build training and test sets 

nsg_train, nsg.tcst, label.train, labcl_tcst « \ 

train_tcst_split!nessages(‘nessage*I, nessages(‘label’), tcst_sizc-0.2) 

print len!nsg_train), Icninsg.tcst!, lcn!nsg_train) •» len!nsg_tcst) 

pipelinc.svn • Pipeline!( 

!‘bow’, CountVcctorizcrlanalyzcr*split_into_lcnnas)), 

!‘tfidf‘, TfidfTransfomcr!)), 

!‘classifier’, SVC!probability»Truel),)) 

» pipeline parameters to automatically explore and tunc 
paran.svn • ( 

{‘classifier C‘: (1, 10, 108, 580, 18081, ‘classifier kernel’: (‘linear’)), 

{‘classifier C‘: (1, 10, 188, 500, 1008), ‘classifier garna’: (8.081, 8.085, 0.8801),\ 

‘classifier_kernel’: (‘rbf’)),) 

grid.svn ■ GridScarchCV! 

pipclinc.svn, » pipeline fron above 

paran_grid-paran_svn, * parameters to tune via cross validation 
rcfit«Truc, * fit using all data, on the best detected classifier 
n_jobs«-l, # number of cores to use for parallelization; -1 for "all cores" 
scoring«‘accuracy‘, « what score are we optinizing? 
cv=StratificdKFoldllabcl.train, n.folds • 18),) 

» find the best combination fron param.svn 
svn.dctcctor « grid_svn.fit!nsg_train, labcl.train) 

print svn_dctcctor.grid_scorcs_ 

print confusion_natrix!labcl_test, svn_dctcctor.predict!nsg_tcst)) 
print classification.rcport!label.test, svn_detcctor.predict!nsg_test)) 

» store the span detector to disk after training 
with open!‘profile.svn.pkl’, ‘wb’) as filcout: 
cPickle.dunp!svn_dctcctor, fileout) 
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The spiit_into_iemmas function ingests the training set, tokenizes the 
words, and stems them. Stemming is the process of striping a word down to its 
base form [15]. The Countvectorizer ingests the result from splitting the 
words into lemmas and converts the terms into a matrix of token counts [16]. The 
TfidTransformer normalizes the words into the term-frequency inverse 
document-frequency (TF-IDF) representation. TF-IDF attempts to reduce the 
impact of words that appear often in the training set because in language, words 
that naturally appear often are generally not good at classifying the text in which 
they appear [17]. For example the word “the” appears very often in language but 
provides little to no value in classifying the text in which it appears. 

The program to build the classifier also explores different parameters 
defined by the user to tune the classifier. The program exploits all logical 
processing cores on the machine. The research utilized a machine with 24 logical 
2.4 GFIz processing cores. The profile classifier build took about 4 minutes and 
the tweet classifier build took about 15 minutes. When complete, the classifier is 
saved as a Pickle object so that it can be loaded as an object into classification 
programs without having to be rebuilt. 

Using both classification methods allowed for a “best of breed” approach 
when determining which classification results were more suited for the needs of 
the research. Combining the results from the profile and tweet classifiers, a list of 
potential military Twitter users was generated. 

D. DETECTION ALGORITHM 

Based on the user profile and tweet classification results, the streaming 
API was accessed using the Twitter user IDs as the only filter. The Twitter 
streaming API allows a filter of up to 5,000 users per session [5]. Of the 1.2 
million users that were originally found, the real time tweets of 30,000 users were 
ingested. Six streaming sessions were opened and maintained to ingest the 
tweets of the users. 
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As tweets were ingested, the words were tokenized and a simple 
algorithm was used to alert that a movement tweet may have been detected. 
Tweets that were retweets of other users were ignored—only first person tweets 
were examined by the algorithm. 

Two lists, or buckets, of tokens were used as the basis of the algorithm. 
The first bucket contained tokens associated with time such as tomorrow, week, 
month, etc. The second bucket contained military movement type words such as 
deployment, cruise, underway, etc. If a tweet contained at least one token from 
each bucket, an alert was issued. Table 3 shows the two buckets of tokens. 


Table 3. Detection Tokens 


1 Navy Tokens 

Temporal Tokens 

atlantic 

lant 

afternoon 

schedule 

boat 

mediterranean 

day 

scheduled 

bridge 

navy 

days 

someday 

centcom 

ocean 

early 

soon 

cruise 

pacific 

evening 

tardy 

deploy 

pacom 

late 

today 

deployed 

quarterdeck 

midnight 

tomorrow 

deploying 

ship 

months 

week 

deployment 

submarine 

morning 

weeks 

duty 

underway 

noon 

year 

eastpac 

fleet 

gulf 

watch 

westpac 

now 

yesterday 


The token approach was chosen because there was not enough data 
available to build a machine classifier that could accurately detect operational 
military information. With more time, this approach could build a training set for a 
machine classifier to identify operational military information. 

The detection alert would propagate on the terminal screen where all 
tweets were being displayed and it would also be issued as a text message to 
the author. This alert method is simple but is platform agnostic and demonstrated 

36 








the ease at which the detection alert could be issued cross platform to whatever 
platform was most convenient to the user—watch center, email, text message, or 
integration into another application. 

Figure 20 shows how the data flows from the user to issuing an alert. Text 
message was used as the alert mechanism here but it can easily be modified for 
the appropriate environment. 


Figure 20. Data Flow of Tweet Generation to Detection Alert 
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The code to perform the alerting uses the Tweepy Python library to access 
the streaming Twitter API and is shown in Figure 21. The Tweepy Stream 
instance connects with the Twitter streaming API and passes messages to a 
StreamListener class instance. Within the class, the on_data method receives 
the tweets [18]. Within on_data, the tweet object is decoded from JSON and four 
data fields are pulled out—the tweet time, the tweet user ID, the tweet ID, and 
the tweet text. The tweet is written to a file on disk and printed to the screen. 
Next, if the tweet is not a retweet, the words are transformed to lowercase and 
tokenized. Tweet token membership is checked against the temporal and military 
tokens. If the tweet contains at least one temporal token and one military token a 
detection alert is issued. Here, a text message is sent to the author using simple 
mail transfer protocol (SMTP). 
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Figure 21. Detection Algorithm Python Code 

fron twccpy irport Strcai, OAuthHardlcr 
fron twccpy.strcaiing inport StrcanListcncr 
irport jsor 
inport sntplib 
irport CSV 

# open the file of candidates - S66t rax 
candidates ■ opcnC'candidatcs.csv").rcadlinesO 

# open the file to write out to 

writer - csv.writcr(opcn("livc_twccts.csv“,'‘a'')) 

t define the tcrporal word bucket 

tcnporal.words « ( "today","tonorrow”,"week","nonth","year","soon","weeks", 

"afternoon","early","norning","evening","afternoon", 

"nidnight","noon","now","schedule","scheduled","ronths", 

"tardy","late","yesterday","soneday","day","days"l 

# define military word bucket 

nilitary_words ■ ( "navy","ship","duty","watch","underway","submarine","bridge", 

"deploynent","deploy","deployed","cruise","ocean","quarterdeck", 
"pacific","at Iantic", "mediterranean","gulf","boat","deploying"] 


» alert function 
def sendAlcrt(tweet): 

server ■ sntplib.SHTP('sntp.gnail.con;587') 
server.ehloO 
server.starttlsl) 

server.login('dcvbox^gnail.con*,'password') 

server.sendrailCdcvboxjgnail.CPI', '222S551212?vtcxt.cpn', tweet) 
server.quit() 

print "••• alert S0<T tweet 

return 


class listencr(StreanListcncr): 


def on_data(sclf, data): 


» fomat the data 
data • json.loads(data) 


» pull out data fields and fornat 
twcct_timc ■ datal"crcated_at") 

twcet_user_id ■ datai"uscr"n"id_str") 

twcct_id • data("id_str"] 

twcet_tcxt » data|"tcxt"| 


» write the data to the output file 

writcr.writerow((tweet_tine,tweet_user_id,twcet_id,tweet_textl) 

* print the data to the screen 
print proc,twcct_tine,"\t",\ 
twcct_user_id,"\t",\ 
twcct_id,"\t",\ 
tweet.text 


» exclude retweets and detcrninc tine - nilitary relationship and alert 
if twcct_textl:2l !• "RT": 

if anyCword in twcct_tcxt.lower)).split() for word in tcnporal_words): 
if any(word in twcct_tcxt.lower)).split() for word in nilitary_words) 
sendAlert) twcet_uscr_id*" "•»tweet_text) 

» keys and secrets acquired fron dev.twitter.con 
auth . OAuthHardler)CCNSl«ER_KEY, C))NS'JH£R_SECR£T) 
auth.sct_acccss_token)ACCESS_KEY, ACCESS.SECRET) 

t launch the streanmg API and filter based on user IDs 
twittcrStrcan - Strcan)auth, listener))) 
twittcrStrcan.filter)follow=ids) 


» close the file 
outfile.close)) 
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IV. FINDINGS 


The FY14 and FY15 enlisted E-4 through E-6 promotion lists combined 
consist of about 45,000 names. Queries to Twitter as described in the method 
section result in about 1.2 million unique profiles and nearly 430 million tweets 
from these users. The downloading algorithm downloaded about 1 terabyte of 
data from the Twitter servers for each promotion list. The throughput was 
between 2 and 3 megabytes per second and downloading the data took about a 
two months of round-the-clock operations on a dedicated machine. 

A. DATA OVERVIEW 

Several interesting statistical data points were produced from the results. 
These data points are shown in Table 4. 


Table 4. Data Statistics 


Names Searched 

44,889 

Unique Twitter Accounts Searched 

1,197,210 

Verified Users 

5,678 

Tweets Downloaded 

427,024,296 

ReTweets 

114,268,219 (26.76%) 

Accounts with Profile Data 

685,117 (57.23%) 

Tweets with Geolocation 

13,515,260 (3.17%) 

Tweets Geolocated in the US 

8,834,450 (2.0%) 

Oldest Tweet 

13 July 2006 

Newest Tweet 

24 April 2015 


The newest tweet is from the last day of the data download. 


Figure 22 gives a pictorial representation of the number of tweets 
downloaded for a given date. The tweets were downloaded during March and 
April of 2015. The most recent tweets were downloaded up to 1,000 tweets per 
user. It is clearly seen that the majority of tweets downloaded were created in the 
last two years. 
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Figure 22. Number of Tweets Created by Date 



The vertical axis is the number of tweets created and the horizontal axis is the 
date. The data point plotted represents the number of tweets created 

for a given date. 


Figure 23 shows the methods with which users post tweets. This shows 
that most users are tweeting using a mobile device such as iPhone or Android. 


Figure 23. Top 20 Sources of Tweets 


twitterfeed 

1530699 

Google 

1560968 

Mobile Web (M5) 

» 1590448 

Twitter for Mac 

1808513 

Foursquare 

1873345 

TweetCaster for Android 

) 2076971 

Twitter for Windows Phone 

> 2481758 

iOS 

) 2648107 

Cloudhopper 

( 2770065 

Hootsuite 

■ 3736137 

Echofon 

■ 3794675 

Twitter for Websites 

m 5651957 

TweetDeck 

■ 7323434 

Facebook 

m 8138987 

Twitter for iPad 

B 10971926 

Twitter for BlackBerry 

M 11037621 

Instagram 

M 12537808 

Twitter Web Client 





The top twenty tweet sources is a mix of mobile and web based platforms. 
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Lastly, the geolocated tweets can be plotted on a map as shown in Figure 
24. Doing so gives a picture of where people live, work, and travel. With only the 
continental United States border drawn, the interstate highway system and 
population centers become clearly visible. 

Figure 24. U.S. Geolocated Tweets Plotted 



B. CLASSIFIER ANALYSIS 

The next step after downloading all the data was to attempt to find the 
military users among the data. The data mining approach was two fold. First, 
analyze the user profiles for military members and second analyze the tweets for 
military tweets. 

As discussed in Chapter III, Support Vector Machine (SVM) classifiers 
were chosen to classify the profiles and tweets. The SVM approach required the 
building of a profile training set and a tweet training set. Building the training sets 
is a very tedious process but is necessary because it would take too long to 
manually read all the profiles and tweets. 
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The training set for the profiles consisted of 207 military profiles and about 
6,000 nonmilitary profiles. Finding military profiles manually to train the classifier 
proved to be a very difficult task. Thousands of profiles had to be manually read 
to find the 207 actual military profiles. The 6,000 nonmilitary profiles are a 
combination of validated nonmilitary profiles and the profiles of verified users. 
The training set is too large to publish, but a word cloud of the training set is 
shown in Figure 25. Note that the words are stemmed. 


Figure 25. Word Cloud of Profile Training Set 



The same training data was used for both the MonkeyLearn classifiers 
and the classifiers built in Python. When classifying user profiles, there was an 
86.5% overlap between the MonkeyLearn and Python classifiers. 

A confusion matrix is used to show the accuracy of a classifier and 
displays the intersection of actual classification by predicted classification. Both 
classifiers presented similar confusion matrices as shown in Figure 26. It can be 
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seen that all classifiers had a difficult time accurately differentiating between 
military and non-military profiles and tweets. 


Figure 26. Classifier Confusion Matrices 


MonkeyLearn 


Predicted 


Profiles 

Militarv 

Non Militarv 

MMItarv 

46 

6 

Non Militarv 

26 

1547 


ij# python 


Predicted 


Profiles 

Militarv 

Non Militarv 

Militarv 

33 

6 

Non Militarv 

8 

1253 



Predicted 


Tweets 

Patriotic 

Militarv 

Non Militarv 

Patriotic 

441 

0 

7 

Militarv 

0 

183 

89 

Non Militarv 

8 

42 

1730 


Predicted 


Tweets 

Patriotic 

Militarv 

Non Militarv 

Patriotic 

389 

1 

6 

Militarv 

2 

160 

86 

Non Militarv 

18 

26 

1560 


C. DATA ANALYSIS 

The tweets of nearly 1.2 million users were downloaded based on the 
45,000 names on the two enlisted promotion lists. Of these users, a little more 
than half (57%) offered some kind of information in their profile. The classifiers 
identified 8,107 (1.2%) as being military. This is a bit higher than the national 
average of military to civilians (0.5%) [19], [20] but in line with what was 
expected. 

According to Pew Research, about 1/3 of people aged 18-29 use Twitter 
[21]. The promotion lists used to find the accounts fall within the 18-29 
demographic. We can assume of the profiles without profile data, 1.2% are also 
military. This results in 6,145 profiles that are military without profile data for a 
total of 14,252 military profiles among the 1.2 million downloaded. If about 1/3 of 
our military users in the demographic have a Twitter account, we expect to see 
about 13,900 accounts among the 45,000 people on the two promotion lists. A 
word cloud of the found military profiles is shown in Figure 27. 
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Figure 27. Word Cloud of Found Military Profiles 


new 

peopi 


ole 
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mam 


ever 




thing chill 


■ , playy®^^^-,r-— 
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music 


live 

time 


best;§’famili 

’^Sackworld 

sport 


The next phase of classification was to classify the tweets the users 
generated. This was done in an attempt to find the military users who did not 
provide profile information. SVM classifiers were again used and a training set 
developed. A word cloud of the training set is shown in Figure 28. 

The training set consisted of three categories—military, nonmilitary, and 
patriotic. Patriotic was chosen as a classification to attempt to help the classifier 
due to patriotic and military tweets sounding very similar. It was intended that 
differentiating them in the training set would help the classifier identify the 
subtleties between them. In total, about 11,000 tweets were used to train the 
tweet classifier. The confusion matrices for the two classifiers are shown in 
Figure 26. 

Based on the tweet classification results, if the classifier determined that 

more than 40% of the user’s tweets were military, then that user was identified as 

a potential military candidate. The 40% threshold seems high, but it yields nearly 
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20,000 users who are unique compared to those identified from their profiles. 
The tweet classification is not as good as the profile classification so a wider net 
is required to try to account for the remaining 6,000 users. 


Figure 28. Word Cloud of Tweet Training Set 


Still 

final 


sailor make miss 

sailorgang 


just 


life navylif «„ 

school underway watCh see 

home good 1®* 

want iQpg __ 
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D. ALERTING 

The results of the detection alerting were very interesting. Within two days 
of launching the alerting service, several alerts were issued that either 
announced a movement event or confirmed the fact that the user was a military 
member as shown in Figure 29. Some alerts would confirm both and point to 
profiles that provided other valuable information about the user, their social 
connections, and their profession as show in Figure 30. The two buckets of 
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tokens despite being very rudimentary were effective in alerting that something 
significant was tweeted about. 


Figure 29. Tweets that Issued Alerts 



' Aug 

02-07 again, 4 out of the last 5 duty days 



I don't know if I'm more stoked about being 
in the navy for three years today, or that I'll 
be getting out in two years. 



Aug 15 

Today marks 3 years in the Navy. Another 8 months in Great Lakes 
then off... instagram.con 


V;4S.*-‘.' 1SAug2015 



Well deploying today now. Need to get in 
contact with me do it either through here of 
Facebook and I'll reply when I can.. 



Worst thing about deployment is everyone's 
attitude., the new people have no reason to get mad. 
They have done 1 deployment i've done 5 now 


1 


In 8 months I will go from bring Active Duty Military 
to becoming a full time college student. Life is great 


11:49AW •!5Aog20'i 



I Aug 10 


Duty days < 


These are examples of tweets that met the criteria for an alert. 
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Figure 30. Alerting Process with Candidate Confirmation 



Alert Issued for tweet 



A 


Returned List 
of Potential 
Candidates 

27266210 

51038873 

20S34308 

30S62494 

2S27S693 

87693923 

4243513S 

33233061 

46841060 

8880048 



A 


This is an example of a user who was being followed based on the ID returned 
from Twitter from a promotion list name. The user’s tweet of an upcoming 
deployment would not only alert an analyst of the event but also bring attention to 
the profile to confirm the user is the intended target candidate and point to other 
accounts the user has such as Facebook and Instagram. 


One reason for finding the users first, then using their tweets to alert was 
to attempt to keep the false alarm rate down. This assumption was proven 
correct. Along with the streaming sessions that used the user IDs as filters, a 
streaming session was used that only filtered on the two buckets of tokens and 
alerted based on the same criteria as the user ID filtered streaming sessions. 

The streaming sessions based on user IDs as the filter initially had a false 
alarm rate of about 50%. Once the tokens were modified to remove “watch” and 
“now” the false alarm rate dropped to about 20%. The user ID filtered streaming 
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sessions would issue about five detection alerts per day. The false alarm rate of 
the streaming session filtered on tokens only was estimated at about 99%. This 
is an estimate because the token filtered streaming session issued about 
500 detection alerts per hour and quickly overwhelmed the user. The first 
1,000 detection alerts were read and only two of them proved to be military 
users. 
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V. CONCLUSION AND FUTURE WORK 


A. CONCLUSION 

Time was the biggest limitation when conducting this research. The two 
most time-consuming tasks were collecting the data and classifying the data. 

Collecting the data required learning the intricacies of the Twitter APIs 
including their limitations and boundaries. Downloading the data also took a very 
long time—nearly two months. This was mostly due to rate limiting, as discussed 
in Chapter III, and the throughput from the Twitter servers that is available from 
the public APIs. 

Classifying the data also required learning new skills and required a lot of 
time to read thousands of profiles and tweets to build the training sets. Although 
the classifiers worked, they could have been better had the training sets been 
bigger. Their size was limited by not having enough time to manually read more 
profiles and tweets. The tweet classifier in particular lacked the training data to 
be highly effective. The machine classifiers also took a fair amount of time to 
complete the classification. The paid service took 23 days and the locally built 
classifier took 5 days using 24 processing cores. 

This research showed that it is possible using machine learning to 
automate the discovery of a population of users on Twitter that share a common 
interest. Here, the common interest was being in the U.S. Navy. With the users 
found, it was also shown that it is possible to ingest their tweets in real-time and 
present detection alerts based on combinations of keywords with a low false 
alarm rate. 

B. FUTURE WORK 

The classifiers built for this research worked well, but the noisy and 
unstructured nature of tweet text data caused many profiles and tweets to be 

incorrectly classified. The research presented here shows that automated 
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classification and alerting is possible. The training sets used for this research 
included hundreds of example target profiles and nearly one thousand example 
target tweets. Future work should focus on building a large and accurate training 
set of confirmed profiles and tweets that number into the several thousands. 

Future work should also investigate using profile and tweet classifiers 
against the real-time one percent public stream from the Twitter streaming API. 
The author attempted this to assess feasibility and it appeared possible. It was 
found that real-time profile and tweet classification is possible; however, using a 
single processor for classification was too slow in classifying the stream, causing 
a backlog and hence not classifying in real time. The API also drops the 
connection when the backlog gets too big. A project attempting real-time 
classification of the Twitter stream will need to utilize multiprocessing to keep 
pace with the tweets as they are transferred by the API to the machine. 

The Navy could also benefit from using this research as a foundation to 
build a tool that unit commanders could use to gauge their unit’s exposure on 
social media platforms. The tool would build a database of user profiles attaching 
information to the database such as unit, rank, and rate as the user releases it on 
social media platforms and as it is released officially by the military through 
mediums such as promotion lists. The tool would then display to unit 
commanders how much their people are mentioning unit information in the form 
of a visual tool like a heat map, such as Figure 31. Commanders could then use 
this tool to make adjustments to their schedule if they feel their operational 
security has been compromised. 
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Figure 31. Example Commander’s Schedule Heat Map 


SUNDAY 

MONDAY 

TUESDAY 

WEDNESDAY 

THURSDAY 

FRIDAY 

SATURDAY 

UNDERWAY 

UNDERWAY 

UNDERWAY 

UNDERWAY 

RETURN TO 
PORT 

IN PORT 

IN PORT 


LEGEND 

LOW SOCIAL MEDIA PRESENCE 
MODERATE SOCIAL MEDIA PRESENCE 
HIGH SOCIAL MEDIA PRESENCE 

This figure shows an example of a unit commander’s weekly social media 
presence heat map along with a sample unit schedule for the week. In this 
example, the commander may want to change the schedule to return to port on 

Friday because it appears the current schedule has been compromised on social 

media platforms. 

Sentiment analysis is also an emerging area of data science. 
Commanders may benefit from having a tool that has many of the same 
characteristics of the tool discussed in the previous paragraph but adds a social 
media sentiment analysis capability. Knowing unit sentiment would allow 
commanders to adjust the unit working environment to maximize job satisfaction 
and productivity. 

C. RECOMMENDATIONS 

The military should consider a more secure method of notifying members 
of their promotion status. The current method of publicly posting the promotion 
lists on the Internet presents a security risk to the personnel and the units they 
are assigned to. Besides the examples presented, there were several “high 
value” personnel found through the method presented in Chapter III. For 
example, one member found was a sailor who works in the reactor department 
on an aircraft carrier, and has geolocation enabled on his tweets. 
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It appears that Navy Personal Command began posting official naval 
messages online around the year 2000, as shown in Figure 32. Before this, naval 
messages were sent from Navy Personal Command directly to the units. The 
messages are unclassified and posting them online is convenient for 
commanders and sailors but it is also convenient for adversaries. Posting the 
names of military members openly on the Internet makes finding their social 
media accounts too easy for nefarious actors. 


Figure 32. Navy Personnel Command Online Messages 



Enlisted 


Navy Personnel Command 


Boards Career Info Officer 


Support & Services Organization 


Reference Library 


Navy Personnel Command > Reference Library > Messages > ALNAVS > ALNAV 2015 


ALNAVS 

ALNAV 2015 
ALNAV 2014 
ALNAV 2013 
ALNAV 2012 
ALNAV 2011 
ALNAV 2010 
ALNAV 2009 
ALNAV 2006 
ALNAV 2007 
ALNAV 2006 
ALNAV 2005 
ALNAV 2004 
ALNAV 2003 
ALNAV 2002 
ALNAV 2001 
ALNAV 2000 
NAVAOMINS 


ALNAV 2015 


Message 

# 

Subject 

Date 

063/15 

ESTABLISHMENT OF THE NAVY BASIC MILITARY TRAINING HONOR 

B/16/2015 

GRADUATE RIBBON 

062/15 

CHANGES TO THE GLOBAL WAR ON TERRORISM EXPEDITIONARY 

7/31/2015 

MEDAL 

061/15 

NEW REQUIREMENT TO CONSIDER A VICTIMS PREFERENCE FOR 

7/31/201S 

PROSECUTION BY COURT-MARTIAL OR CIVILIAN COURT 


060/15 

FY-16 REAR ADMIRAL (LOWER HALF) LINE SELECTION 

7/27/2015 


059/15 

FY-16 ACTIVE-DUTY NAVY LIEUTENANT COMMANDER STAFF CORPS 

7/27/2015 

SELECTIONS 

058/15 

INCLUSION OF SEXUAL ORIENTATION AS A BASIS FOR 

7/23/2015 

DISCRIMINATION 

057/15 

2015 FEDS FEED FAMILIES ANNUAL FOOD DRIVE 

7/20/2015 


Shown is The Navy Personal Command All Navy Messages website. In view are 
several promotion announcement messages that contain the name and rank of 
military members. On the left, an archive dating back to 2000 is available. 


Military members in the Navy currently access their personnel records, 
professional data, and training through CAC secured websites hosted by the U.S. 
government. The Navy should consider designing a capability where each 
member could be notified of their promotion status on one of these CAC secured 
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websites. This would be a major course correction for the way Navy Personnel 
Command conducts their business but the current information age requires that 
information be treated differently than it has been treated in the past. 

Publicly releasing information on the Internet based solely on the fact that 
it is unclassified is negligent behavior. Information that is unclassified should not 
warrant a blanket public release on the Internet. Military leadership should 
assess whether there is a need for the information to be released and weigh that 
need against the risk of an adversary using that information in a way that could 
be harmful to the personnel and their units. 
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